R has a full library of tools for working with spatial data. This includes tools for both vector and raster data, as well as interfacing with data from other sources (like ArcGIS) and making maps.
There are tons of libraries in the R universe that provide functionality for working with spatial data. You can find an overview of these packages here. In this document, I will introduce four highly popular libraries:
- sp: for working with vector data
- rgdal: for importing and exporting vector data from other programs
- raster: for working with raster data
- rgeos: for working with geometric operations, such as creating buffers, geometric intersections, unions of polygons, etc.

Moreover, I will point you to a recently written package called sf (for Simple Features). It follows the tidyverse philosophy and is a pretty interesting improvement on sp/rgdal/rgeos. It may replace these packages in the future.
# load package for vector data
library(sp)
library(rgdal)
# load package for raster data
library(raster)
# load package for working with geometric operations
library(rgeos)
# new package for vector data; will replace sp, rgdal, and rgeos for vector data
library(sf)
# data management and plotting
library(tidyverse)
First, I will introduce you to the two types of spatial data you will encounter in R: vector data and raster data.
Almost all spatial vector data structures in R are based on the sp package. Even other libraries that may seem independent are usually built on top of sp, even if you can’t see it.
The sp package has three main types of spatial data we'll work with: points, lines, and polygons. There are some differences among these types, but they are structured very similarly.
To help you understand what these data structures are like, in this section we’ll create some spatial data from scratch. This is probably not how you’ll work with data in the future – most of the time you just import spatial data from a source – but this exercise will help give you a good foundation and help you know how to troubleshoot problems in the future.
There are three basic steps to creating spatial data by hand:

- Create a set of geometric objects (points, lines, or polygons)
- Convert those geometric objects into Spatial* objects (* stands for Points, Lines, or Polygons)
- (Optionally) Add a data.frame of attributes, which turns the Spatial* object into a Spatial*DataFrame object

Points are the most basic geometric shape, so we begin by building a SpatialPoints object.
A point is defined by a pair of x-y coordinates, so we can create a set of points by (a) creating a matrix of x-y coordinates, and (b) passing it to the SpatialPoints function to create our first SpatialPoints object:
toy.coordinates <- rbind(c(1.5, 2.00),
c(2.5, 2.00),
c(0.5, 0.50),
c(1.0, 0.25),
c(1.5, 0.00),
c(2.0, 0.00),
c(2.5, 0.00),
c(3.0, 0.25),
c(3.5, 0.50))
toy.coordinates
[,1] [,2]
[1,] 1.5 2.00
[2,] 2.5 2.00
[3,] 0.5 0.50
[4,] 1.0 0.25
[5,] 1.5 0.00
[6,] 2.0 0.00
[7,] 2.5 0.00
[8,] 3.0 0.25
[9,] 3.5 0.50
my.first.points <- SpatialPoints(toy.coordinates) # ..converted into a spatial object
plot(my.first.points)
To get a summary of how R sees these points, we can ask for summary information in a couple of different ways:
summary(my.first.points)
Object of class SpatialPoints
Coordinates:
min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: NA
proj4string : [NA]
Number of points: 9
coordinates(my.first.points)
coords.x1 coords.x2
[1,] 1.5 2.00
[2,] 2.5 2.00
[3,] 0.5 0.50
[4,] 1.0 0.25
[5,] 1.5 0.00
[6,] 2.0 0.00
[7,] 2.5 0.00
[8,] 3.0 0.25
[9,] 3.5 0.50
Unlike a simple geometric object, a SpatialPoints object has the ability to keep track of how the coordinates of its points relate to places in the real world through an associated “Coordinate Reference System” (CRS – the combination of a geographic coordinate system and possibly a projection), which is stored using a code called a proj4string. The proj4string is so important to a SpatialPoints object that it's presented right in the summary:
summary(my.first.points)
Object of class SpatialPoints
Coordinates:
min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: NA
proj4string : [NA]
Number of points: 9
In this case, however, while our SpatialPoints object clearly knows what a CRS is, the object we just created does not have a projection or geographic coordinate system defined. It is still fine to plot, but be aware that many meaningful spatial operations will require a defined CRS.
CRS objects can be created by passing the CRS() function the code associated with a known projection. You can find the codes for most commonly used projections from www.spatialreference.org.
Note that the same CRS can often be called in many ways. For example, one of the most commonly used CRS is the WGS84 latitude-longitude projection. You can create a WGS84 lat-long projection object by either passing the reference code for the projection – CRS("+init=epsg:4326") – or by directly calling all its relevant parameters – CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"). Your choice of CRS usually depends on when the data was collected, the geographic extent of the data, and the purpose of the data. The previous example – WGS84 (EPSG: 4326) Latitude/Longitude – is commonly used for data for the entire globe or many countries.
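To see this equivalence concretely, here are the two calls from the paragraph above side by side (note that expanding the +init=epsg:... shorthand relies on rgdal being installed):

```r
library(sp)
library(rgdal)  # needed to expand +init=epsg:... codes into full parameters

# two equivalent ways of creating the WGS84 lat-long CRS
crs_by_code   <- CRS("+init=epsg:4326")
crs_by_params <- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs")
```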
Here’s an illustration of assigning a CRS:
is.projected(my.first.points) # see if a projection is defined.
[1] NA
# Returns `NA` if no geographic coordinate system or projection; returns FALSE if has geographic coordinate system but no projection.
crs.geo <- CRS("+init=epsg:32633") # UTM 33N, UTM projection for these coordinates
proj4string(my.first.points) <- crs.geo # define projection system of our data
is.projected(my.first.points)
[1] TRUE
summary(my.first.points)
Object of class SpatialPoints
Coordinates:
min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: TRUE
proj4string :
[+init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
+ellps=WGS84 +towgs84=0,0,0]
Number of points: 9
When CRS is called with only an EPSG code, R will try to complete the CRS with the parameters looked up in the EPSG table.
Geometry-only objects (objects without attributes) can be subsetted similar to how vectors or lists are subsetted; we can select the first two points by
my.first.points[1:2]
class : SpatialPoints
features : 2
extent : 1.5, 2.5, 2, 2 (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
Moving from a SpatialPoints to a SpatialPointsDataFrame occurs when you add a data.frame of attributes to the points. Let’s just add an arbitrary table to the data – this will label each point with a letter and a number. Note points will merge with the data.frame based on the order of observations.
df <- tibble(attr1 = c('a','b','z','d','e','q','w','r','z'), attr2 = c(101:109))
df
# A tibble: 9 x 2
attr1 attr2
<chr> <int>
1 a 101
2 b 102
3 z 103
4 d 104
5 e 105
6 q 106
7 w 107
8 r 108
9 z 109
my.first.spdf <- SpatialPointsDataFrame(my.first.points, df)
summary(my.first.spdf)
Object of class SpatialPointsDataFrame
Coordinates:
min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
Is projected: TRUE
proj4string :
[+init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs
+ellps=WGS84 +towgs84=0,0,0]
Number of points: 9
Data attributes:
attr1 attr2
Length:9 Min. :101
Class :character 1st Qu.:103
Mode :character Median :105
Mean :105
3rd Qu.:107
Max. :109
Now that we have attributes, we can also subset our data the same way we would subset a data.frame. Some subsetting:
my.first.spdf[1:2, ] # row 1 and 2 only
class : SpatialPointsDataFrame
features : 2
extent : 1.5, 2.5, 2, 2 (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 2
names : attr1, attr2
min values : a, 101
max values : b, 102
my.first.spdf[1:2, "attr1"] # row 1 and 2 only, attr1
class : SpatialPointsDataFrame
features : 2
extent : 1.5, 2.5, 2, 2 (xmin, xmax, ymin, ymax)
coord. ref. : +init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0
variables : 1
names : attr1
min values : a
max values : b
plot(my.first.spdf[which(my.first.spdf$attr2 > 105),]) # select points where attr2 > 105
A SpatialPointsDataFrame object can also be created directly from a data.frame by specifying which columns contain the coordinates. This is useful, for example, if you have a spreadsheet that contains latitude, longitude and some values. You can create the object from the data frame in one step by using the coordinates() command, which automatically turns the data.frame into a SpatialPointsDataFrame.
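As a minimal sketch of this one-step promotion, using a small made-up data.frame:

```r
library(sp)

# a hypothetical spreadsheet-like table with coordinate columns
df2 <- data.frame(lon   = c(1.5, 2.5, 0.5),
                  lat   = c(2.0, 2.0, 0.5),
                  attr1 = c("a", "b", "z"))

# promote the data.frame in place; the named columns become the coordinates
coordinates(df2) <- c("lon", "lat")  # equivalently: coordinates(df2) <- ~lon + lat
class(df2)  # now a SpatialPointsDataFrame
```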
df_with_coords <- tibble(
lon = toy.coordinates[,1],
lat = toy.coordinates[,2],
attr1 = c('a','b','z','d','e','q','w','r','z'),
attr2 = c(101:109))
my.second.spdf <- SpatialPointsDataFrame(
coordinates(select(df_with_coords, lon, lat)), select(df_with_coords, -(lon:lat)))
summary(my.second.spdf)
Object of class SpatialPointsDataFrame
Coordinates:
min max
lon 0.5 3.5
lat 0.0 2.0
Is projected: NA
proj4string : [NA]
Number of points: 9
Data attributes:
attr1 attr2
Length:9 Min. :101
Class :character 1st Qu.:103
Mode :character Median :105
Mean :105
3rd Qu.:107
Max. :109
SpatialPolygons are very, very common (think administrative borders, countries, etc.), so they’re important to get used to.
SpatialPolygons from scratch

SpatialPolygons are a little more complicated than SpatialPoints. With SpatialPoints, we moved directly from x-y coordinates to a SpatialPoints object.
To get a SpatialPolygons object, we have to build it up by (a) creating Polygon objects, (b) combining those into Polygons objects (note the “s” at the end), and finally (c) combining those to create SpatialPolygons. So what are these components?
- A Polygon object is a single geometric shape (e.g. a square, rectangle, etc.) defined by a single uninterrupted line around the exterior of the shape.
- A Polygons object consists of one or more simple geometric objects (Polygon objects) that combine to form what you think of as a single unit of analysis (an “observation”). For example, each island in Hawaii would be a Polygon, but Hawaii itself is the Polygons consisting of all the individual island Polygon objects.
- A SpatialPolygons object is a collection of Polygons objects, where each Polygons object is an “observation”. For example, if we wanted a map of US states, we would make a Polygons object for each state, then combine them all into a single SpatialPolygons. If you're familiar with shapefiles, SpatialPolygons is basically the R analogue of a shapefile or layer.

One special note: if you want to put a hole in a polygon (e.g. drawing a donut, or if you wanted to draw South Africa and leave a hole in the middle for Lesotho), you do so by (a) creating a Polygon object for the outline, (b) creating a second Polygon object for the hole and passing the argument hole=TRUE, and (c) combining the two into a Polygons object.
Let’s try building up a SpatialPolygon!
# create Polygon objects from coordinates.
# Each object is a single geometric polygon defined by a bounding line.
house1.building <- Polygon(rbind(c(1, 1),
c(2, 1),
c(2, 0),
c(1, 0)))
house1.roof <- Polygon(rbind(c(1.0, 1),
c(1.5, 2),
c(2.0, 1)))
house2.building <- Polygon(rbind(c(3, 1),
c(4, 1),
c(4, 0),
c(3, 0)))
house2.roof <- Polygon(rbind(c(3.0, 1),
c(3.5, 2),
c(4.0, 1)))
house2.door <- Polygon(rbind(c(3.25,0.75),
c(3.75,0.75),
c(3.75,0.00),
c(3.25,0.00)),
hole=TRUE)
# create Polygons objects from lists of Polygon objects and a unique ID
# A `Polygons` is like a single observation.
h1 <- Polygons(list(house1.building, house1.roof), "house1")
h2 <- Polygons(list(house2.building, house2.roof, house2.door), "house2")
# create spatial polygons object from lists
# A SpatialPolygons is like a shapefile or layer.
houses <- SpatialPolygons(list(h1,h2))
plot(houses)
As with SpatialPoints, we can associate a data.frame with SpatialPolygons. There are two things that are important about doing this with SpatialPolygons:

- When you first associate a data.frame with a SpatialPolygons object, R will line up rows and polygons by matching the Polygons object names with the data.frame row.names.
- For the rest of the SpatialPolygonsDataFrame's life, the association between Polygons and rows of your data.frame is based on the order of rows in your data.frame, so don't try to change the order of the data.frame by hand!

Make attributes and plot. Note how the door – which we created with hole=TRUE – is empty!
attr <- data.frame(attr1=1:2, attr2=6:5, row.names=c("house2", "house1"))
houses.DF <- SpatialPolygonsDataFrame(houses, attr)
as.data.frame(houses.DF) # Notice the rows were re-ordered!
attr1 attr2
house1 2 5
house2 1 6
spplot(houses.DF)
As with SpatialPoints, a SpatialPolygons object doesn't actually know where it is on Earth until you set its Coordinate Reference System, which you can do the same way you did with the SpatialPoints objects:
crs.geo <- CRS("+init=epsg:4326") # geographical, datum WGS84
proj4string(houses.DF) <- crs.geo # define projection system of our data
SpatialLines objects are basically like SpatialPolygons, except they're built up from Line objects (each a single continuous line, like one branch of a river), through Lines objects (a collection of lines that form a single unit of analysis, like all the parts of a river), to SpatialLines (a collection of “observations”, like rivers).
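For completeness, here is a minimal sketch of that flow, using made-up coordinates for a hypothetical two-branch river:

```r
library(sp)

# each Line is a single continuous stretch
branch1 <- Line(rbind(c(0, 0), c(1, 1), c(2, 1)))
branch2 <- Line(rbind(c(2, 1), c(3, 0)))

# a Lines object groups them into one "observation" with a unique ID
river <- Lines(list(branch1, branch2), ID = "river1")

# a SpatialLines object collects the observations
my.rivers <- SpatialLines(list(river))
plot(my.rivers)
```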
Here's a quick summary of the construction flow of Spatial* objects: individual geometries (Point coordinates, Line, Polygon) are grouped into Lines/Polygons objects (one per observation), those are combined into SpatialPoints/SpatialLines/SpatialPolygons, and adding a data.frame of attributes yields a Spatial*DataFrame.
It turns out Spatial* objects are just a collection of components that you can inspect using the str command:
str(my.first.spdf)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..@ data :Classes 'tbl_df', 'tbl' and 'data.frame': 9 obs. of 2 variables:
.. ..$ attr1: chr [1:9] "a" "b" "z" "d" ...
.. ..$ attr2: int [1:9] 101 102 103 104 105 106 107 108 109
..@ coords.nrs : num(0)
..@ coords : num [1:9, 1:2] 1.5 2.5 0.5 1 1.5 2 2.5 3 3.5 2 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
..@ bbox : num [1:2, 1:2] 0.5 0 3.5 2
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr "+init=epsg:32633 +proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"
What this shows is that my.first.spdf is actually a collection of different things – data, coords, bbox, and a proj4string. In the same way we can get many of these things with commands, we can also access them directly with the @ operator:
bbox(my.first.spdf)
min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
my.first.spdf@bbox
min max
coords.x1 0.5 3.5
coords.x2 0.0 2.0
coordinates(my.first.spdf)
coords.x1 coords.x2
[1,] 1.5 2.00
[2,] 2.5 2.00
[3,] 0.5 0.50
[4,] 1.0 0.25
[5,] 1.5 0.00
[6,] 2.0 0.00
[7,] 2.5 0.00
[8,] 3.0 0.25
[9,] 3.5 0.50
my.first.spdf@coords
coords.x1 coords.x2
[1,] 1.5 2.00
[2,] 2.5 2.00
[3,] 0.5 0.50
[4,] 1.0 0.25
[5,] 1.5 0.00
[6,] 2.0 0.00
[7,] 2.5 0.00
[8,] 3.0 0.25
[9,] 3.5 0.50
as.data.frame(my.first.spdf)
attr1 attr2 coords.x1 coords.x2
1 a 101 1.5 2.00
2 b 102 2.5 2.00
3 z 103 0.5 0.50
4 d 104 1.0 0.25
5 e 105 1.5 0.00
6 q 106 2.0 0.00
7 w 107 2.5 0.00
8 r 108 3.0 0.25
9 z 109 3.5 0.50
my.first.spdf@data
# A tibble: 9 x 2
attr1 attr2
<chr> <int>
1 a 101
2 b 102
3 z 103
4 d 104
5 e 105
6 q 106
7 w 107
8 r 108
9 z 109
In R, the items prefixed with @ in this output are called “slots”. If you are used to working in another object-oriented language like Java or Python, these are analogous to attributes. You can access each of these slots using the @ operator, making it the equivalent of some of the function calls above. As a general rule, do not modify slots by assigning values to them directly unless you know exactly what you are doing.
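If you prefer function calls over the @ operator, base R's slot() function is its read-access equivalent:

```r
# slot() gives the same read access as the @ operator
identical(slot(my.first.spdf, "data"), my.first.spdf@data)  # TRUE

# list all slots an object has
slotNames(my.first.spdf)
```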
Rasters are much more compact than vectors. Because of their regular structure, the coordinates do not need to be recorded for each pixel or cell in the rectangular extent. A raster has a CRS, an origin, a distance or cell size in each direction, a dimension in terms of number of cells, and an array of values. If necessary, the coordinates for any cell can be computed.
Note that the sp library used for vector data does have some basic tools for manipulating raster data. However, the sp library has largely been replaced by the raster library we will use here, and anything one can do with the sp library can also be done with the raster library.
A raster dataset has three primary components: a grid definition (the extent and the number of rows and columns), a coordinate reference system, and the values associated with each cell.

It's relatively easy to create a raster object by just defining the grid:
basic_raster <- raster(ncol=5, nrow=10, xmn=0, xmx=5, ymn=0, ymx=10)
basic_raster
class : RasterLayer
dimensions : 10, 5, 50 (nrow, ncol, ncell)
resolution : 1, 1 (x, y)
extent : 0, 5, 0, 10 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
However, note that this raster has a grid, but no data:
hasValues(basic_raster)
[1] FALSE
We add data to the raster object using the values function:
values(basic_raster) <- 1:50 # Note 50 is the total number of cells in the grid.
plot(basic_raster)
Note that even though a grid is a 2-dimensional object, raster expects the data as a one-dimensional vector and assigns the values by (a) starting in the top-left cell, then (b) moving across the row from left to right, and finally (c) moving down a row and repeating the whole process. Thus the vector of values must have the same length as the total number of cells.
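To make the fill order visible, here is a tiny sketch with a 2 x 3 grid:

```r
library(raster)

r <- raster(ncol = 3, nrow = 2, xmn = 0, xmx = 3, ymn = 0, ymx = 2)
values(r) <- 1:6   # filled left-to-right, top-to-bottom
as.matrix(r)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

xyFromCell(r, 1)   # centre coordinates of the top-left cell
```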
To define a projection, we use the same proj4 strings as vector data, but without the intermediate step of creating a CRS object:
projection(basic_raster) <- "+init=epsg:4326"
rgdal package for Vector data; raster package for Raster data

Normally we do not create Spatial* objects by hand. It is much more common to work with existing data read from external sources like shapefiles or databases.
In order to read those into R and turn them into Spatial* family objects we rely on the rgdal package. It provides us direct access to the powerful GDAL library from within R.
We can read in and write out spatial data using:
readOGR() and writeOGR()
The parameters provided for each function vary depending on the exact spatial file type you are reading. We will take the ESRI shapefile as an example. A shapefile consists of various files, and R expects all of those files to be in one directory.
When reading in a shapefile, readOGR() expects at least the following two arguments:
datasource name (dsn): the path to the folder that contains the files (note that this is a path to the folder, not to a file)
layer name (layer): the file name without extension (note that this is just the name of the file, not a path)
For example, if I have a shapefile called sf_incomes.shp and all its associated files (like .dbf, .prj, .shx) in a directory called RGIS1_Data/shapefiles/ in my project folder, my command to read this shapefile would look like this:
my_shapefile <- readOGR(dsn = "RGIS1_Data/shapefiles/", layer = "sf_incomes")
OGR data source with driver: ESRI Shapefile
Source: "RGIS1_Data/shapefiles/", layer: "sf_incomes"
with 182 features
It has 3 fields
Integer64 fields read as strings: Trc2000
plot(my_shapefile)
Note that you may run into trouble if you set your working directory to the folder holding the shapefile, as readOGR() doesn't accept an empty string as its first (dsn) argument.
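If you do want to read from the working directory, a common workaround is to pass "." as the dsn (shown here with the same shapefile name as above):

```r
library(rgdal)

# "." points readOGR at the current working directory
my_shapefile <- readOGR(dsn = ".", layer = "sf_incomes")
```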
The raster library can also read many file types on its own. For example, let's load San Francisco elevation data:
my_raster_file <- raster("RGIS1_Data/sanfrancisconorth.dem")
plot(my_raster_file)
This section provides an overview of tools for geocoding – converting addresses or names of locations into latitudes and longitudes – using the google maps API.
Google offers a service that allows users to submit requests for the latitudes and longitudes associated with different addresses or place names from within R and to get back results that are easy to work with in R. This service is called the Google Geocoding API.
Basically, the google maps API will accept any query you could type into Google Maps and returns information on Google's best guess for the latitude and longitude associated with your query. The tool for doing this from within R is found in the ggmap library, and the basic syntax is as follows:
library(ggmap)
addresses <- c("1600 Pennsylvania NW, Washington, DC", "denver, co")
locations <- geocode(addresses, source="google", output = "more")
locations
lon lat type loctype
1 -77.03657 38.89766 establishment rooftop
2 -104.99025 39.73924 locality approximate
address north south
1 1600 pennsylvania ave nw, washington, dc 20500, usa 38.89896 38.89626
2 denver, co, usa 39.91425 39.61443
east west route street_number
1 -77.03539 -77.03808 Pennsylvania Avenue Northwest 1600
2 -104.60030 -105.10993 <NA> <NA>
neighborhood locality administrative_area_level_1
1 Northwest Washington Washington District of Columbia
2 <NA> Denver Colorado
country postal_code administrative_area_level_2
1 United States 20500 <NA>
2 United States <NA> Denver County
Note the output option can be set to “latlon”, “latlona”, “more”, or “all” depending on how much information you want back. I would STRONGLY recommend always using the output="more" option so that you get information on how certain google is about its guess!
geocode results

Geocoding results include two fields that are very important to understand: loctype and type.
Google thinks of the world as containing a number of different types of locations. Some are points (like houses), while others are areas (like cities). The type field tells you if google is giving you the location of a house, or just the centroid of a city. A full list of different types is available from google here, but the most common results (in my experience) are:
- street_address: indicates a precise street address
- locality: indicates an incorporated city or town political entity
- point_of_interest: indicates a named point of interest. Typically, these “POIs” are prominent local entities that don't easily fit in another category, such as “Empire State Building” or “Statue of Liberty.”
- administrative_area_level_[SOME NUMBER]: these are “civil entities”, where 0 is the country, 1 is the first administrative level below that (in the US, states), 2 is below that (in the US, counties), etc.

This is important because if you get a locality or administrative area, the latitude and longitude you get is just the centroid of that locality, and you should interpret it as such!
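For example, to act on this distinction you might keep only the precise matches from the locations object created above (a sketch; the type column is present because we used output = "more"):

```r
# keep only results that Google matched to a precise street address
precise <- locations[locations$type == "street_address", ]
```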
The loctype column provides similar but distinct information to type, including more information about street_address results. In order of precision, the possible values for this field are:

- rooftop: a precise geocode, accurate down to the street-address level
- range_interpolated: an approximation interpolated between two precise points (such as intersections), used when a rooftop geocode is unavailable for a street address
- geometric_center: the geometric center of a result such as a polyline (e.g. a street) or polygon (e.g. a region)
- approximate: anything less precise than the above
Google limits individual (free) users to 2,500 queries per day and 10 queries per second. You can see how many queries you have remaining by typing geocodeQueryCheck(). You can also buy additional requests (up to 100,000 a day) for $0.50 / 1000 requests, and there is a paid subscription service that will provide up to 100,000 requests a day.
Because of this, it is important that you not test your code on your entire dataset or you’ll waste your queries when you’re debugging your code!
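A simple habit that follows from this: check your remaining quota and debug on a small subset first (a sketch using the addresses vector from above):

```r
library(ggmap)

geocodeQueryCheck()  # how many free queries remain today

# debug on a handful of rows before geocoding everything
test_run <- geocode(head(addresses, 2), source = "google", output = "more")
```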
ggplot2

library(tidyverse)
library(sp)
library(rgdal)
library(rgeos)
library(raster)
library(viridis)
library(gtable)
library(grid)
# load data: average age in CH
data <- read.csv("input_grossenbacher/avg_age_15.csv", stringsAsFactors = F)
# load shapefile
gde_15 <- readOGR("input_grossenbacher/geodata/gde-1-1-15.shp", layer = "gde-1-1-15")
OGR data source with driver: ESRI Shapefile
Source: "input_grossenbacher/geodata/gde-1-1-15.shp", layer: "gde-1-1-15"
with 2324 features
It has 2 fields
Integer64 fields read as strings: BFS_ID
# project it
# set crs to ch1903/lv03, just to make sure (EPSG:21781)
crs(gde_15) <- "+proj=somerc +lat_0=46.95240555555556
+lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000
+ellps=bessel +towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs"
# fortify, i.e., make ggplot2-compatible
# fortify SpatialPolygonsDataFrame to read it in ggplot
map_data_fortified <- fortify(gde_15, region = "BFS_ID") %>%
mutate(id = as.numeric(id))
# join fortified SpatialPolygonsDataFrame with data
map_data <- map_data_fortified %>% left_join(data, by = c("id" = "bfs_id"))
# load municipalities shapefile
gde_15_political <- readOGR("input_grossenbacher/geodata/g1g15.shp", layer = "g1g15")
OGR data source with driver: ESRI Shapefile
Source: "input_grossenbacher/geodata/g1g15.shp", layer: "g1g15"
with 2328 features
It has 20 fields
# project it
crs(gde_15_political) <- "+proj=somerc +lat_0=46.95240555555556
+lon_0=7.439583333333333 +k_0=1 +x_0=600000 +y_0=200000
+ellps=bessel +towgs84=674.374,15.056,405.346,0,0,0,0 +units=m +no_defs"
# fortify SpatialPolygonsDataFrame to read it in ggplot
map_data_political_fortified <- fortify(gde_15_political, region = "GMDNR") %>%
mutate(id = as.numeric(id))
# join fortified SpatialPolygonsDataFrame with data
map_data_political <- map_data_political_fortified %>% left_join(data, by = c("id" = "bfs_id"))
map_data_political <- map_data_political[complete.cases(map_data_political),]
# read in background relief raster
relief <- raster("input_grossenbacher/geodata/02-relief-georef-clipped-resampled.tif")
relief_spdf <- as(relief, "SpatialPixelsDataFrame")
# relief is converted to a very simple data frame,
# just as the fortified municipalities.
# for that we need to convert it to a
# SpatialPixelsDataFrame first, and then extract its contents
# using as.data.frame
relief <- as.data.frame(relief_spdf) %>%
rename(value = `X02.relief.georef.clipped.resampled`)
# remove unnecessary variables
rm(relief_spdf)
rm(gde_15)
rm(map_data_fortified)
rm(map_data_political_fortified)
# customize theme
theme_map <- function(...) {
theme_minimal() +
theme(
text = element_text(family = "Avenir Next Medium", color = "#22211d"),
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
# panel.grid.minor = element_line(color = "#ebebe5", size = 0.2),
panel.grid.major = element_line(color = "#ebebe5", size = 0.2),
panel.grid.minor = element_blank(),
plot.background = element_rect(fill = "#f5f5f2", color = NA),
panel.background = element_rect(fill = "#f5f5f2", color = NA),
legend.background = element_rect(fill = "#f5f5f2", color = NA),
panel.border = element_blank(),
...
)
}
# customize legend
extendLegendWithExtremes <- function(p){
p_grob <- ggplotGrob(p)
legend <- gtable_filter(p_grob, "guide-box")
legend_grobs <- legend$grobs[[1]]$grobs[[1]]
# grab the first key of legend
legend_first_key <- gtable_filter(legend_grobs, "key-3-1-1")
legend_first_key$widths <- unit(2, units = "cm")
# modify its width and x properties to make it longer
legend_first_key$grobs[[1]]$width <- unit(2, units = "cm")
legend_first_key$grobs[[1]]$x <- unit(0.15, units = "cm")
# last key of legend
legend_last_key <- gtable_filter(legend_grobs, "key-3-6-1")
legend_last_key$widths <- unit(2, units = "cm")
# analogous
legend_last_key$grobs[[1]]$width <- unit(2, units = "cm")
legend_last_key$grobs[[1]]$x <- unit(1.02, units = "cm")
# grab the last label so we can also shift its position
legend_last_label <- gtable_filter(legend_grobs, "label-5-6")
legend_last_label$grobs[[1]]$x <- unit(2, units = "cm")
# Insert new color legend back into the combined legend
legend_grobs$grobs[legend_grobs$layout$name == "key-3-1-1"][[1]] <-
legend_first_key$grobs[[1]]
legend_grobs$grobs[legend_grobs$layout$name == "key-3-6-1"][[1]] <-
legend_last_key$grobs[[1]]
legend_grobs$grobs[legend_grobs$layout$name == "label-5-6"][[1]] <-
legend_last_label$grobs[[1]]
# finally, I need to create a new label for the minimum value
new_first_label <- legend_last_label$grobs[[1]]
new_first_label$label <- round(min(map_data$avg_age_15, na.rm = T), 2)
new_first_label$x <- unit(-0.15, units = "cm")
new_first_label$hjust <- 1
legend_grobs <- gtable_add_grob(legend_grobs,
new_first_label,
t = 6,
l = 2,
name = "label-5-0",
clip = "off")
legend$grobs[[1]]$grobs[1][[1]] <- legend_grobs
p_grob$grobs[p_grob$layout$name == "guide-box"][[1]] <- legend
# the plot is now drawn using this grid function
grid.newpage()
grid.draw(p_grob)
}
# same code as above but different breaks
pretty_breaks <- c(40,42,44,46,48)
# find the extremes
minVal <- min(map_data$avg_age_15, na.rm = T)
maxVal <- max(map_data$avg_age_15, na.rm = T)
# compute labels
labels <- c()
brks <- c(minVal, pretty_breaks, maxVal)
# round the labels (actually, only the extremes)
for(idx in 1:length(brks)){
labels <- c(labels,round(brks[idx + 1], 2))
}
labels <- labels[-length(labels)] # drop the last entry (an NA produced by the loop above)
# define a new variable on the data set just as above
map_data$brks <- cut(map_data$avg_age_15,
breaks = brks,
include.lowest = TRUE,
labels = labels)
brks_scale <- levels(map_data$brks)
labels_scale <- rev(brks_scale)
# plot it
p <- ggplot() +
# municipality polygons
geom_raster(data = relief, aes_string(x = "x",
y = "y",
alpha = "value")) +
scale_alpha(name = "", range = c(0.6, 0), guide = F) +
geom_polygon(data = map_data, aes(fill = brks,
x = long,
y = lat,
group = group)) +
# municipality outline
geom_path(data = map_data, aes(x = long,
y = lat,
group = group),
color = "white", size = 0.1) +
coord_equal() +
theme_map() +
theme(
legend.position = c(0.5, 0.03),
legend.text.align = 0,
legend.background = element_rect(fill = alpha('white', 0.0)),
legend.text = element_text(size = 7, hjust = 0, color = "#4e4d47"),
plot.title = element_text(hjust = 0.5, color = "#4e4d47"),
plot.subtitle = element_text(hjust = 0.5, color = "#4e4d47",
margin = margin(b = -0.1,
t = -0.1,
l = 2,
unit = "cm"),
debug = F),
legend.title = element_text(size = 8),
plot.margin = unit(c(.5,.5,.2,.5), "cm"),
panel.spacing = unit(c(-.1,0.2,.2,0.2), "cm"),
panel.border = element_blank(),
plot.caption = element_text(size = 6,
hjust = 0.92,
margin = margin(t = 0.2,
b = 0,
unit = "cm"),
color = "#939184")
) +
labs(x = NULL,
y = NULL,
title = "Switzerland's regional demographics",
subtitle = "Average age in Swiss municipalities, 2015",
caption = "Map CC-BY-SA; Author: Timo Grossenbacher (@grssnbchr), Geometries: ThemaKart, BFS; Data: BFS, 2016; Relief: swisstopo, 2016") +
scale_fill_manual(
values = rev(magma(8, alpha = 0.8)[2:7]),
breaks = rev(brks_scale),
name = "Average age",
drop = FALSE,
labels = labels_scale,
guide = guide_legend(
direction = "horizontal",
keyheight = unit(2, units = "mm"),
keywidth = unit(70/length(labels), units = "mm"),
title.position = 'top',
title.hjust = 0.5,
label.hjust = 1,
nrow = 1,
byrow = T,
reverse = T,
label.position = "bottom"
)
)
extendLegendWithExtremes(p)
library(rgdal)
library(ggplot2)
palo_alto <- readOGR("RGIS3_Data/palo_alto", "palo_alto")
OGR data source with driver: ESRI Shapefile
Source: "RGIS3_Data/palo_alto", layer: "palo_alto"
with 371 features
It has 5 fields
# create a unique ID for the later join
palo_alto$id = rownames(as.data.frame(palo_alto))
# turn SpatialPolygonsDataframe into a data frame using fortify
palo_alto.pts <- fortify(palo_alto, region="id") #this only has the coordinates
palo_alto.df <- merge(palo_alto.pts, palo_alto, by="id", all.x=TRUE) # add the attributes back (left join)
# calculate even breaks
palo_alto.df$qt <- cut(palo_alto.df$PrCpInc, 6)
# plot
ggplot(palo_alto.df, aes(long, lat, group = group, fill = qt)) +
geom_polygon() +
scale_fill_brewer("Per Cap Income", palette = "OrRd", labels = c('0 to 26,800',
'26,800 to 43,900',
'43,900 to 60,900',
'60,900 to 78,000',
'78,000 to 95,100',
'95,100 to 112,000')) +
theme(line = element_blank(),
axis.text=element_blank(),
axis.title=element_blank(),
panel.background = element_blank()) +
coord_equal()
library(tmap)
tm_shape(palo_alto) +
tm_fill("PrCpInc", palette = 'OrRd')
library(ggmap)
get_map('university zurich') %>% ggmap
get_map('university zurich', zoom = 15) %>% ggmap
europe <- c(left = -12, bottom = 35, right = 30, top = 63)
get_map(europe, zoom = 5) %>% ggmap
get_stamenmap(europe, zoom = 5, maptype = "toner-lite") %>% ggmap()
my_data <- tibble(
cities = c('Barcelona', 'London', 'Dublin', 'Zürich', 'Paris', 'Berlin'),
values = c(50, 20, 70, 10, 40, 90)
)
locations <- geocode(my_data$cities, source = 'google', output = 'more')
left_join(my_data, locations, by = c('cities' = 'locality')) %>%
select(cities:lat) -> my_data
get_stamenmap(europe, zoom = 5, maptype = "toner-lite") %>%
ggmap() +
geom_point(data = my_data, aes(x = lon, y = lat, size = values), alpha = 0.5)
sf package
In a few words, here’s what is interesting about sf:
- an sf object is a data.frame – which means that a lot of data analysis methods are readily available,
- it unites the functionality of sp, rgdal, and rgeos under one unique package,
- it reads and writes data faster than rgdal,
- it performs better than sp and rgdal — the upcoming version will include spatial indexing!

The trick of sf is that spatial is not that special anymore: it describes spatial attributes as geometries, which is just another attribute of the dataset.
What was SpatialPoints, SpatialPolygons, etc. before will now be: MULTIPOINT, MULTIPOLYGON, etc.
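These building blocks can be created by hand: a single geometry (sfg) is wrapped into a geometry column (sfc) carrying a CRS, and st_sf() attaches ordinary attributes. A minimal sketch — the coordinates and the name are made up:

```r
library(sf)

# a single point geometry (sfg) ...
pt <- st_point(c(8.55, 47.37))   # arbitrary lon/lat
# ... wrapped into a geometry column (sfc) with a CRS ...
gc <- st_sfc(pt, crs = 4326)
# ... plus ordinary attributes gives a full sf data.frame
city <- st_sf(name = "Zurich", geometry = gc)
class(city)   # "sf" "data.frame"
```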
library(sf)
nc <- st_read(system.file("shape/nc.shp", package="sf"))
Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
Simple feature collection with 100 features and 14 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
str(nc)
Classes 'sf' and 'data.frame': 100 obs. of 15 variables:
$ AREA : num 0.114 0.061 0.143 0.07 0.153 0.097 0.062 0.091 0.118 0.124 ...
$ PERIMETER: num 1.44 1.23 1.63 2.97 2.21 ...
$ CNTY_ : num 1825 1827 1828 1831 1832 ...
$ CNTY_ID : num 1825 1827 1828 1831 1832 ...
$ NAME : Factor w/ 100 levels "Alamance","Alexander",..: 5 3 86 27 66 46 15 37 93 85 ...
$ FIPS : Factor w/ 100 levels "37001","37003",..: 5 3 86 27 66 46 15 37 93 85 ...
$ FIPSNO : num 37009 37005 37171 37053 37131 ...
$ CRESS_ID : int 5 3 86 27 66 46 15 37 93 85 ...
$ BIR74 : num 1091 487 3188 508 1421 ...
$ SID74 : num 1 0 5 1 9 7 0 0 4 1 ...
$ NWBIR74 : num 10 10 208 123 1066 ...
$ BIR79 : num 1364 542 3616 830 1606 ...
$ SID79 : num 0 3 6 2 3 5 2 2 2 5 ...
$ NWBIR79 : num 19 12 260 145 1197 ...
$ geometry :sfc_MULTIPOLYGON of length 100; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:27, 1:2] -81.5 -81.5 -81.6 -81.6 -81.7 ...
..- attr(*, "class")= chr "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
..- attr(*, "names")= chr "AREA" "PERIMETER" "CNTY_" "CNTY_ID" ...
print(nc[9:15], n = 3)
Simple feature collection with 100 features and 6 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
First 3 features:
BIR74 SID74 NWBIR74 BIR79 SID79 NWBIR79 geometry
1 1091 1 10 1364 0 19 MULTIPOLYGON (((-81.4727554...
2 487 0 10 542 3 12 MULTIPOLYGON (((-81.2398910...
3 3188 5 208 3616 6 260 MULTIPOLYGON (((-80.4563446...
The functions provided by sf are prefixed with st_, which makes them easy to find via tab completion at the command line.
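Because every verb shares the st_ prefix, you can also list them programmatically; a quick sketch (assumes sf is attached):

```r
library(sf)

# every exported sf function whose name starts with st_
sf_fns <- grep("^st_", ls("package:sf"), value = TRUE)
head(sf_fns)
```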
You can load spatial data from txt or csv files using the familiar readr commands,
df <- read_csv('./path/to/my/file.csv')
Let’s take the example dataset meuse from the sp package. meuse is a data.frame — similar to what could have been read from a CSV file using read_csv:
data('meuse', package = "sp")
head(meuse)
x y cadmium copper lead zinc elev dist om ffreq soil
1 181072 333611 11.7 85 299 1022 7.909 0.00135803 13.6 1 1
2 181025 333558 8.6 81 277 1141 6.983 0.01222430 14.0 1 1
3 181165 333537 6.5 68 199 640 7.800 0.10302900 13.0 1 1
4 181298 333484 2.6 81 116 257 7.655 0.19009400 8.0 1 2
5 181307 333330 2.8 48 117 269 7.480 0.27709000 8.7 1 2
lime landuse dist.m
1 1 Ah 50
2 1 Ah 30
3 1 Ah 150
4 0 Ga 270
5 0 Ah 380
[ reached getOption("max.print") -- omitted 1 row ]
# what was
# SpatialPointsDataFrame(
# coordinates(select(meuse, x, y)), select(meuse, -(x:y)))
ms <- st_as_sf(
meuse,
coords = c('x', 'y'),
crs = "+init=epsg:28992"
)
ms
Simple feature collection with 155 features and 12 fields
geometry type: POINT
dimension: XY
bbox: xmin: 178605 ymin: 329714 xmax: 181390 ymax: 333611
epsg (SRID): 28992
proj4string: +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +towgs84=565.4171,50.3319,465.5524,-0.398957,0.343988,-1.8774,4.0725 +units=m +no_defs
First 75 features:
cadmium copper lead zinc elev dist om ffreq soil lime landuse
1 11.7 85 299 1022 7.909 0.00135803 13.6 1 1 1 Ah
2 8.6 81 277 1141 6.983 0.01222430 14.0 1 1 1 Ah
3 6.5 68 199 640 7.800 0.10302900 13.0 1 1 1 Ah
4 2.6 81 116 257 7.655 0.19009400 8.0 1 2 0 Ga
5 2.8 48 117 269 7.480 0.27709000 8.7 1 2 0 Ah
dist.m geometry
1 50 POINT (181072 333611)
2 30 POINT (181025 333558)
3 150 POINT (181165 333537)
4 270 POINT (181298 333484)
5 380 POINT (181307 333330)
[ reached getOption("max.print") -- omitted 70 rows ]
There is a simple plotting function included in sf. It is very similar to the old sp::spplot:
plot(ms)
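By default plot() draws one small map per attribute column. Selecting a single column gives a single map instead; a short sketch that rebuilds ms so the snippet is self-contained (needs the sp package for the meuse data):

```r
library(sf)

data("meuse", package = "sp")
ms <- st_as_sf(meuse, coords = c("x", "y"), crs = 28992)

# subsetting to one attribute plots a single map instead of a panel per column
plot(ms["zinc"])
```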
# what would have been before readOGR()
file_name <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(file_name)
Reading layer `nc' from data source `/Library/Frameworks/R.framework/Versions/3.4/Resources/library/sf/shape/nc.shp' using driver `ESRI Shapefile'
Simple feature collection with 100 features and 14 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
print(nc)
Simple feature collection with 100 features and 14 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
First 75 features:
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74
1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091
2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487
3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188
4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508
5 0.153 2.206 1832 1832 Northampton 37131 37131 66 1421
SID74 NWBIR74 BIR79 SID79 NWBIR79 geometry
1 1 10 1364 0 19 MULTIPOLYGON (((-81.4727554...
2 0 10 542 3 12 MULTIPOLYGON (((-81.2398910...
3 5 208 3616 6 260 MULTIPOLYGON (((-80.4563446...
4 1 123 830 2 145 MULTIPOLYGON (((-76.0089721...
5 9 1066 1606 3 1197 MULTIPOLYGON (((-77.2176666...
[ reached getOption("max.print") -- omitted 70 rows ]
plot(nc)
plot(nc['AREA'])
st_write(nc, "output/nc.shp")
Writing layer `nc' to data source `output/nc.shp' using driver `ESRI Shapefile'
features: 100
fields: 14
geometry type: Multi Polygon
st_write(nc, "output/nc.shp", delete_layer = TRUE) # overwrites an existing file
Deleting layer `nc' using driver `ESRI Shapefile'
Writing layer `nc' to data source `/Users/econspare/Dropbox/teaching/2017 ProgrammingCamp/20-r-gis/output/nc.shp' using driver `ESRI Shapefile'
features: 100
fields: 14
geometry type: Multi Polygon
write_sf(nc, "output/nc.shp") # same as st_write() with quiet = TRUE and delete_layer = TRUE
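A quick round trip is a cheap sanity check after writing. The sketch below writes to a temporary directory instead of output/; read_sf() is the tidyverse-style counterpart of st_read() and is quiet by default:

```r
library(sf)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# write to a throw-away location and read the file back in
out <- file.path(tempdir(), "nc.shp")
write_sf(nc, out)
nc2 <- read_sf(out)

nrow(nc2) == nrow(nc)   # TRUE
```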
non_spatial_df <- data.frame(
  CNTY_ID = nc$CNTY_ID,
  my_data = runif(nrow(nc))
)
left_join(nc, non_spatial_df)
Simple feature collection with 100 features and 15 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
epsg (SRID): 4267
proj4string: +proj=longlat +datum=NAD27 +no_defs
First 75 features:
AREA PERIMETER CNTY_ CNTY_ID NAME FIPS FIPSNO CRESS_ID BIR74
1 0.114 1.442 1825 1825 Ashe 37009 37009 5 1091
2 0.061 1.231 1827 1827 Alleghany 37005 37005 3 487
3 0.143 1.630 1828 1828 Surry 37171 37171 86 3188
4 0.070 2.968 1831 1831 Currituck 37053 37053 27 508
SID74 NWBIR74 BIR79 SID79 NWBIR79 my_data
1 1 10 1364 0 19 0.30614580
2 0 10 542 3 12 0.63743965
3 5 208 3616 6 260 0.84719127
4 1 123 830 2 145 0.07022662
geometry
1 MULTIPOLYGON (((-81.4727554...
2 MULTIPOLYGON (((-81.2398910...
3 MULTIPOLYGON (((-80.4563446...
4 MULTIPOLYGON (((-76.0089721...
[ reached getOption("max.print") -- omitted 71 rows ]
st_crs(nc)
$epsg
[1] 4267
$proj4string
[1] "+proj=longlat +datum=NAD27 +no_defs"
attr(,"class")
[1] "crs"
st_is_longlat(nc)
[1] TRUE
nc_p <- st_transform(nc, crs = 32119)
st_crs(nc_p)
$epsg
[1] 32119
$proj4string
[1] "+proj=lcc +lat_1=36.16666666666666 +lat_2=34.33333333333334 +lat_0=33.75 +lon_0=-79 +x_0=609601.22 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"
attr(,"class")
[1] "crs"
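One practical reason to transform: in a projected CRS with metre units, geometric measures come out in metres. A hedged sketch using sf's bundled nc data (st_centroid() and st_distance() are standard sf functions):

```r
library(sf)

nc   <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc_p <- st_transform(nc, crs = 32119)   # NC State Plane, units in metres

# distance between the centroids of the first two counties, in metres
cent <- st_centroid(st_geometry(nc_p))
st_distance(cent[1], cent[2])
```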
# devtools::install_github('tidyverse/ggplot2', force = TRUE)
# install and restart your R session
library(sf)
library(ggplot2)
nc <- st_read(system.file("shape/nc.shp", package = 'sf'), quiet = TRUE)
ggplot(nc) +
geom_sf(aes(fill = AREA))
library(viridis)
ggplot(nc) +
geom_sf(aes(fill = AREA)) +
scale_fill_viridis("Area") +
ggtitle("Area of counties in North Carolina") +
theme_bw()
library(viridis)
ggplot(nc) +
geom_sf(aes(fill = AREA)) +
scale_fill_viridis("Area") +
coord_sf(crs = st_crs(102003)) +
ggtitle("Area of counties in North Carolina (Albers Projection)") +
theme_bw()
This document draws heavily on the excellent tutorials by Nick Eubank.
Moreover, have a look at these cheatsheets, which may help you learn both GIS concepts and vocabulary: